Regression with Linear Factored Functions
Many applications that use empirically estimated functions face a curse of dimensionality, because integrals over most function classes must be approximated by sampling. This paper introduces a novel regression algorithm that learns linear factored functions (LFF). This class of functions has structural properties that allow certain integrals to be solved analytically and point-wise products to be computed cheaply. Applications like belief propagation and reinforcement learning can exploit these properties to break the curse and speed up computation. We derive a regularized greedy optimization scheme that learns factored basis functions during training. The novel regression algorithm performs competitively with Gaussian processes on benchmark tasks, and the learned LFF functions are very compact, with 4-9 factored basis functions on average.
Comment: Under review as conference paper at ECML/PKDD 201
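To illustrate the structural property the abstract relies on, here is a minimal sketch of a linear factored function with monomial per-dimension factors. The basis choice is purely illustrative (not the one used in the paper); it shows how the integral over the unit hypercube factorizes into a product of cheap per-dimension integrals instead of requiring sampling:

```python
import numpy as np

# Linear factored function: f(x) = sum_i c_i * prod_d phi_{i,d}(x_d).
# Illustrative per-dimension factors: monomials x^k on [0, 1], so
#   int f dx = sum_i c_i * prod_d 1 / (k_{i,d} + 1)
# in closed form (the factorization, not this basis, is the point).

def lff_eval(x, coeffs, powers):
    """Evaluate f at points x (n, d), given coeffs (m,) and powers (m, d)."""
    # Product over dimensions of x_d ** k_{i,d} for each basis function i.
    basis = np.prod(x[:, None, :] ** powers[None, :, :], axis=2)  # (n, m)
    return basis @ coeffs

def lff_integral(coeffs, powers):
    """Analytic integral of f over the unit hypercube."""
    return coeffs @ np.prod(1.0 / (powers + 1.0), axis=1)
```

For example, with two basis functions over two dimensions, the analytic integral matches what Monte Carlo sampling would only approximate.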
Predicting Fluid Intelligence of Children using T1-weighted MR Images and a StackNet
In this work, we use T1-weighted MR images and a StackNet to predict fluid intelligence in adolescents. Our framework comprises feature extraction, feature normalization, feature denoising, feature selection, StackNet training, and fluid-intelligence prediction. The extracted features are the distributions of different brain tissues across brain parcellation regions. The proposed StackNet consists of three layers and 11 models; each layer uses the predictions from all previous layers, including the input layer. The proposed StackNet is evaluated on the public Adolescent Brain Cognitive Development Neurocognitive Prediction Challenge 2019 benchmark and achieves a mean squared error of 82.42 on the combined training and validation set with 10-fold cross-validation. It also achieves a mean squared error of 94.25 on the test data. The source code is available on GitHub.
Comment: 8 pages, 2 figures, 3 tables, Accepted by MICCAI ABCD-NP Challenge 2019; Added ND
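The layer-wise stacking described above can be sketched as follows. This is a hedged toy version using ridge base models, bootstrap resampling for diversity, and two layers; the actual challenge entry used three layers and 11 heterogeneous models, so treat this as an illustration of the wiring (each layer sees the input plus all previous predictions), not the authors' implementation:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression weights."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def stacknet_fit(X, y, n_layers=2, models_per_layer=3, lam=1.0, seed=0):
    rng = np.random.default_rng(seed)
    layers, feats = [], X
    for _ in range(n_layers):
        layer = []
        for _ in range(models_per_layer):
            # Bootstrap-resample rows so the models within a layer differ.
            idx = rng.integers(0, len(y), len(y))
            layer.append(ridge_fit(feats[idx], y[idx], lam))
        preds = np.column_stack([feats @ w for w in layer])
        layers.append(layer)
        feats = np.hstack([feats, preds])  # input + all previous predictions
    return layers

def stacknet_predict(X, layers):
    feats = X
    for layer in layers:
        preds = np.column_stack([feats @ w for w in layer])
        feats = np.hstack([feats, preds])
    return preds.mean(axis=1)  # average the final layer's models
```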
Predictive gene lists for breast cancer prognosis: A topographic visualisation study
<p>Abstract</p> <p>Background</p> <p>The controversy surrounding the non-uniqueness of predictive gene lists (PGLs), small selected subsets of genes drawn from the very large pool of candidates available in DNA microarray experiments, is now widely acknowledged <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Many of these studies have focused on constructing discriminative semi-parametric models and as such are also subject to the issue of random correlations arising from sparse model selection in high-dimensional spaces. In this work we outline a different approach based on an unsupervised, patient-specific, nonlinear topographic projection of predictive gene lists.</p> <p>Methods</p> <p>We construct nonlinear topographic projection maps based on inter-patient gene-list relative dissimilarities. The Neuroscale, Stochastic Neighbor Embedding (SNE) and Locally Linear Embedding (LLE) techniques are used to construct two-dimensional projective visualisation plots of the 70-dimensional PGLs per patient. Classifiers are then constructed to identify the prognosis indicator of each patient from the resulting projections, and we investigate whether the two prognosis groups are separable <it>a posteriori </it>on the evidence of the gene lists.</p> <p>A literature-proposed predictive gene list for breast cancer is benchmarked against a separate gene list using the above methods. Generalisation ability is investigated by using the mapping capability of Neuroscale to visualise the follow-up study, based on the projections derived from the original dataset.</p> <p>Results</p> <p>The results indicate that small subsets of patient-specific PGLs have insufficient prognostic dissimilarity to permit a distinction between the two prognosis groups. Uncertainty and diversity across multiple gene expressions prevent unambiguous, or even confident, patient grouping. Comparative projections across different PGLs provide similar results.</p> <p>Conclusion</p> <p>The random-correlation effect induced by selecting small subsets from very high-dimensional, interrelated gene expression profiles leads to outcomes with associated uncertainty. This continuum of uncertainty precludes any attempt at constructing discriminative classifiers.</p> <p>However, a patient's gene expression profile could possibly be used in treatment planning, based on knowledge of other patients' responses.</p> <p>We conclude that many of the patients involved in such medical studies are <it>intrinsically unclassifiable </it>on the basis of the provided PGL evidence. This additional category of 'unclassifiable' should be accommodated within medical decision support systems if serious errors and unnecessary adjuvant therapy are to be avoided.</p>
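The core idea of embedding patients from an inter-patient dissimilarity matrix can be illustrated with classical metric MDS. This is a simpler stand-in for Neuroscale, SNE and LLE (the paper's actual techniques), used here only to show how a 2-D topographic map is obtained from dissimilarities alone:

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Embed points into n_components dimensions from a dissimilarity matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)        # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))
```

For Euclidean dissimilarities, classical MDS recovers the original inter-point distances exactly (up to rotation), which is the sense in which the 2-D map is "topographic".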
Fast empirical Bayesian LASSO for multiple quantitative trait locus mapping
<p>Abstract</p> <p>Background</p> <p>The Bayesian shrinkage technique has been applied to multiple quantitative trait loci (QTLs) mapping to estimate the genetic effects of QTLs on quantitative traits from a very large set of possible effects, including the main and epistatic effects of QTLs. Although the recently developed empirical Bayes (EB) method significantly reduced computation compared with the fully Bayesian approach, its speed and accuracy are limited by the fact that numerical optimization is required to estimate the variance components in the QTL model.</p> <p>Results</p> <p>We developed a fast empirical Bayesian LASSO (EBLASSO) method for multiple QTL mapping. The fact that the EBLASSO estimates the variance components in closed form, along with other algorithmic techniques, renders the EBLASSO method more efficient and accurate. Compared with the EB method, our simulation study demonstrated that the EBLASSO method could substantially improve computational speed and detect more QTL effects without increasing the false positive rate. In particular, the EBLASSO algorithm running on a personal computer could easily handle a linear QTL model with more than 100,000 variables in our simulation study. Real data analysis also demonstrated that the EBLASSO method detected more reasonable effects than the EB method. Compared with the LASSO, our simulation showed that the current Matlab implementation of the EBLASSO ran at a speed similar to that of the LASSO implemented in Fortran, and that the EBLASSO detected the same number of true effects as the LASSO but far fewer false positive effects.</p> <p>Conclusions</p> <p>The EBLASSO method can handle a large number of effects, possibly including the main and epistatic QTL effects, environmental effects, and the effects of gene-environment interactions. It will be a very useful tool for multiple QTL mapping.</p>
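The LASSO baseline that the abstract compares against can be sketched with cyclic coordinate descent. This is a generic textbook implementation of sparse effect selection (as in QTL models with many candidate effects), not the EBLASSO's closed-form variance-component updates:

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Plain LASSO (0.5*||y - X b||^2 + lam*||b||_1) by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]              # remove j's current contribution
            z = X[:, j] @ r                     # partial correlation with residual
            beta[j] = soft_threshold(z, lam) / col_sq[j]
            r -= X[:, j] * beta[j]
    return beta
```

On a sparse simulated model, most coefficients are driven exactly to zero, which is the behaviour that makes LASSO-type methods attractive when screening 100,000+ candidate effects.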
Î-stochastic neighbour embedding for feed-forward data visualization
t-distributed Stochastic Neighbour Embedding (t-SNE) is one of the most popular nonlinear dimension reduction techniques, used in multiple application domains. In this paper we propose a variation on the embedding neighbourhood distribution, resulting in Î-SNE, which can construct a feed-forward mapping using an RBF network. We compare the visualizations generated by Î-SNE with those of t-SNE and provide empirical evidence suggesting that the network is capable of robust interpolation and automatic weight regularization.
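The feed-forward mapping idea, training an RBF network to reproduce a given 2-D embedding so that new points can be projected without re-running the embedding, can be sketched as a regularized least-squares fit. The widths, centres and regularization below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def rbf_design(X, centres, width):
    """Gaussian RBF design matrix between points X and the network centres."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * width ** 2))

def fit_feedforward_map(X, Y, centres, width, reg=1e-3):
    """Least-squares RBF map from data space to a given 2-D embedding Y."""
    Phi = rbf_design(X, centres, width)
    # Ridge-regularized normal equations for the output weights.
    W = np.linalg.solve(Phi.T @ Phi + reg * np.eye(Phi.shape[1]), Phi.T @ Y)
    return W
```

New points are then embedded with `rbf_design(X_new, centres, width) @ W`, giving the interpolation behaviour the abstract refers to.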
Direct estimation of wall shear stress from aneurysmal morphology: A statistical approach
Computational fluid dynamics (CFD) is a valuable tool for studying vascular diseases, but it requires long computation times. To alleviate this issue, we propose a statistical framework to predict aneurysmal wall shear stress patterns directly from the aneurysm shape. A database of 38 complex intracranial aneurysm shapes is used to generate aneurysm morphologies and CFD simulations. The shapes and wall shear stresses are then converted to clouds of hybrid points containing both types of information. These are subsequently used to train a joint statistical model implementing a mixture of principal component analyzers. Given a new aneurysmal shape, the trained joint model is first collapsed to a shape-only model and used to initialize the missing shear stress values. The estimated hybrid point set is further refined by projection into the joint model space. We demonstrate that our predicted patterns achieve significant similarity to the CFD-based results.
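The collapse-and-predict step can be illustrated with a plain joint PCA over concatenated shape and stress features: fit principal axes on the joint vectors, then, for a new shape, solve for latent coordinates using only the shape block and read off the stress block. This is a single-analyzer sketch with synthetic feature vectors, not the paper's mixture of principal component analyzers on point clouds:

```python
import numpy as np

def fit_joint_pca(S, W, n_components):
    """PCA of concatenated [shape | stress] training vectors."""
    Z = np.hstack([S, W])
    mu = Z.mean(axis=0)
    U, sv, Vt = np.linalg.svd(Z - mu, full_matrices=False)
    return mu, Vt[:n_components]          # mean and principal axes (k, ds+dw)

def predict_stress(s_new, mu, V, n_shape):
    """Infer the stress block from the shape block via least squares in latent space."""
    Vs, Vw = V[:, :n_shape], V[:, n_shape:]   # split axes into shape/stress blocks
    c, *_ = np.linalg.lstsq(Vs.T, s_new - mu[:n_shape], rcond=None)
    return mu[n_shape:] + Vw.T @ c
```

When shape and stress truly share a low-dimensional latent structure, the stress block is recovered from the shape block alone.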
ABCD Neurocognitive Prediction Challenge 2019: Predicting individual fluid intelligence scores from structural MRI using probabilistic segmentation and kernel ridge regression
We applied several regression and deep learning methods to predict fluid intelligence scores from T1-weighted MRI scans as part of the ABCD Neurocognitive Prediction Challenge (ABCD-NP-Challenge) 2019. We used voxel intensities and probabilistic tissue-type labels derived from them as features to train the models. The best predictive performance (lowest mean-squared error) came from kernel ridge regression (KRR), which produced a mean-squared error of 69.7204 on the validation set and 92.1298 on the test set. This placed our group fifth on the validation leaderboard and first on the final (test) leaderboard.
Comment: Winning entry in the ABCD Neurocognitive Prediction Challenge at MICCAI 2019. 7 pages plus references, 3 figures, 1 tabl
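Kernel ridge regression is compact enough to sketch directly. The RBF kernel and hyperparameters below are illustrative choices, not the challenge entry's configuration or features:

```python
import numpy as np

def krr_fit(X, y, lam=1.0, gamma=0.1):
    """Fit kernel ridge regression with an RBF kernel; returns dual weights."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-gamma * d2)
    # Solve (K + lam*I) alpha = y; predictions are kernel-weighted sums.
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return alpha

def krr_predict(X_new, X, alpha, gamma=0.1):
    d2 = ((X_new[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2) @ alpha
```

The regularizer `lam` trades training fit against smoothness, which is the property that makes KRR competitive on small-sample, high-dimensional neuroimaging features.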
Radiocarbon dating of methane and carbon dioxide evaded from a temperate peatland stream
Streams draining peatlands export large quantities of carbon in different chemical forms and are an important part of the carbon cycle. Radiocarbon (14C) analysis provides unique information on the sources of carbon and the rates at which it is cycled through ecosystems, as has recently been demonstrated at the air-water interface through analysis of carbon dioxide (CO2) lost from peatland streams by evasion (degassing). Peatland streams also have the potential to release large amounts of methane (CH4) and, though 14C analysis of CH4 emitted by ebullition (bubbling) has been reported previously, diffusive emissions have not. We describe methods that enable 14C analysis of CH4 evaded from peatland streams. Using these methods, we investigated the 14C age and stable carbon isotope composition of both CH4 and CO2 evaded from a small peatland stream draining a temperate raised mire. Methane was aged between 1617 and 1987 years BP and was much older than the CO2, which had an age range of 303-521 years BP. Isotope mass balance modelling of the results indicated that the CO2 and CH4 evaded from the stream were derived from different source areas, with most evaded CO2 originating from younger layers nearer the peat surface than the CH4 sources. The study demonstrates the insight into peatland carbon cycling that can be gained from a methodological development enabling dual isotope (14C and 13C) analysis of both CH4 and CO2 collected at the same time and in the same way.
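The isotope mass balance mentioned above reduces, in its simplest form, to a two-end-member mixing calculation: the fraction of a sample attributable to one source follows from the sample's isotope signature and the two end-member signatures. The sketch and the values in the test are hypothetical illustrations, not the study's measurements:

```python
# Two-end-member isotope mass balance:
#   delta_sample = f * delta_a + (1 - f) * delta_b
# solved for f, the fraction of the sample derived from end-member A.
# End-member signatures used anywhere with this function are assumptions.

def mixing_fraction(delta_sample, delta_a, delta_b):
    """Fraction of the sample derived from end-member A (0 <= f <= 1 if delta_sample lies between the end-members)."""
    return (delta_sample - delta_b) / (delta_a - delta_b)
```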
Effect of fulvic acids on lead-induced oxidative stress to metal sensitive Vicia faba L. plant
Lead (Pb) is a ubiquitous environmental pollutant capable of disrupting various morphological, physiological, and biochemical functions in plants. Only a few publications focus on the influence of Pb speciation on both its phytoavailability and its phytotoxicity. Therefore, Pb toxicity (in terms of lipid peroxidation, hydrogen peroxide induction, and photosynthetic pigment contents) was studied in Vicia faba plants in relation to Pb uptake and speciation. V. faba seedlings were exposed to Pb supplied as Pb(NO3)2 or complexed by two fulvic acids (FAs), i.e. Suwannee River fulvic acid (SRFA) and Elliott Soil fulvic acid (ESFA), for 1, 12, and 24 h under controlled hydroponic conditions. For both FAs, Pb uptake and translocation by Vicia faba increased at a low application level (5 mg l⁻¹) but decreased at a high level (25 mg l⁻¹). Despite the increased Pb uptake with FAs at low concentrations, there was no influence on Pb toxicity to the plants. However, at high concentrations, FAs reduced Pb toxicity by reducing its uptake. These results highlighted the role of the dilution factor for FA reactivity in relation to structure; SRFA was more effective than ESFA in reducing Pb uptake and alleviating Pb toxicity to V. faba, owing to its comparatively strong binding affinity for the heavy metal.
R-Gada: a fast and flexible pipeline for copy number analysis in association studies
<p>Abstract</p> <p>Background</p> <p>Genome-wide association studies (GWAS) using Copy Number Variation (CNV) are becoming a central focus of genetic research. CNVs have successfully provided target genome regions for some disease conditions where simple genetic variation (i.e., SNPs) has previously failed to provide a clear association.</p> <p>Results</p> <p>Here we present a new R package that integrates: (i) data import from the most common formats of Affymetrix, Illumina and aCGH arrays; (ii) a fast and accurate segmentation algorithm to call CNVs based on Genome Alteration Detection Analysis (GADA); and (iii) functions for displaying and exporting the copy number calls, identification of recurrent CNVs, multivariate analysis of population structure, and tools for performing association studies. Using a large dataset containing 270 HapMap individuals (Affymetrix Human SNP Array 6.0 Sample Dataset), we demonstrate a flexible pipeline implemented with the package. It requires less than one minute per sample (arrays of 3 million probes) on a single-core computer and provides flexible parallelization for very large datasets. Case-control data were generated from the HapMap dataset to demonstrate a GWAS analysis.</p> <p>Conclusions</p> <p>The package provides the tools for creating a complete integrated pipeline from data normalization to statistical association. It can efficiently handle a massive volume of data consisting of millions of genetic markers and hundreds or thousands of samples with very accurate results.</p>
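The segmentation step at the heart of such CNV-calling pipelines can be illustrated with simple recursive binary segmentation on a 1-D log-ratio signal: repeatedly place the single best mean-change point until no split exceeds a score threshold. This is a generic sketch, not the GADA algorithm (which uses sparse Bayesian learning with backward elimination):

```python
import numpy as np

def best_split(x):
    """Index and t-like score of the best single mean change point in x."""
    n = len(x)
    best_i, best_score = None, 0.0
    csum = np.cumsum(x)
    total = csum[-1]
    for i in range(2, n - 2):
        m1, m2 = csum[i] / i, (total - csum[i]) / (n - i)
        # Mean difference scaled by segment sizes (t-statistic up to sigma).
        score = abs(m1 - m2) * np.sqrt(i * (n - i) / n)
        if score > best_score:
            best_i, best_score = i, score
    return best_i, best_score

def segment(x, threshold=3.0):
    """Recursive binary segmentation; returns sorted breakpoint indices."""
    i, score = best_split(x)
    if i is None or score < threshold:
        return []
    left = segment(x[:i], threshold)
    right = [i + j for j in segment(x[i:], threshold)]
    return sorted(left + [i] + right)
```

On a noisy signal with one amplified region, the two recovered breakpoints bracket the copy-number change.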